PHP Web Site Generation using Ruby

by Eric Rollins

this article is published on the Code Generation Network web site www.codegeneration.net

Introduction

"Raising the level of abstraction means moving toward WHAT, not HOW - telling the system what we want to do, declaratively, instead of how to do it, procedurally. This trend is desirable because declarative means the system does the work, while procedural means the user does the work." C.J. Date, What Not How, 2000.

In Chapter 10 Database Access of Code Generation in Action Jack Herrington and I presented a code generator written in Ruby that generated a SQL schema and Enterprise Java Beans database access tier based on an input XML schema description. For this article I have modified the Ruby code to generate PHP code, and extended it to generate production PHP/HTML web pages.

I will cover building a database access layer for PHP, but I will also focus on the numerous benefits of generating production web pages using a system that has complete knowledge of the database schema. To finish I will discuss the highly productive development style that is enabled by continuously extending the declarative grammar in your own generator, and then provide some conclusions.

The Ruby source code and other files are available here.

I will not be explaining the Ruby source code; a detailed explanation is presented in Chapter 10 of the book.

And code generation provides many other advantages I do not discuss; please see the book and the Code Generation Network website FAQ for more.

Test Case

Our test case is an application that manages book publishing. It consists of 5 tables:

Book
bookID title ISBN authorID publisherID status numCopies
100 Object Oriented Perl 1-884777-79-1 100 100 2 1
101 Bitter Java 1-930110-43-X 101 100 2 1

Author
authorID name penName
100 Conway
101 Tate

Publisher
publisherID name
100 Manning

Store
storeID name
100 Borders

StoreBook
storeID bookID quantity
100 100 45
100 101 399

Generator Architecture

Our generator will take four XML files as input and using a series of templates will build a series of SQL and PHP files. The diagram below shows this relationship:

The four input files are:

schema.xml Describes the database tables, columns, column datatypes, and foreign key relationships.
extensions.xml Describes extended Value Objects, queries, and methods beyond those automatically generated from schema.xml.
pages.xml Describes production list, add, update, and delete web pages which access tables described in schema.xml.
samples.xml Describes sample data that is placed in a Tests.php web page to be automatically loaded via the generated PHP APIs.

The output files are:

tables.sql SQL script that creates tables and foreign key relationships.
code/*SS.php Database access layer (class) for table *; provides add, update, delete, get, getAll, and custom queries and methods.
code/*Value.php Value Object class for table *; used to pass data between web pages and *SS layer.
tests/Tests.php PHP web page used to load sample data described in samples.xml.
webtest/*Add.php, *Update.php, *Delete.php, *ValueList.php Test PHP web pages automatically generated for table *.
web/*.php Production PHP web pages specified in pages.xml.

Database Access Architecture

The PHP web pages do not access the database directly. Instead they pass Value Objects to the SS layer, which in turn calls the PHP PEAR DB database abstraction layer API. This is diagrammed below:

The web browser talks to the PHP page on the Apache web server. The web page creates (or requests) a Value Object. The Value Object is passed to the SS layer, where it is transformed into a SQL statement passed to the PEAR database abstraction layer API, and on to the database.

PHP Generator

Now that we have completed a high level overview, lets drill into the generator itself. We first start with the lowest level, the schema.

Schema

The fundamental input to the generator is the schema.xml file. A fragment is show below:
  <table name="Book">
    <column name="bookID" datatype="integer" not-null="true"
      primary-key="true" />
    <column name="title" datatype="varchar" length="80" not-null="true" />
    <column name="ISBN" datatype="varchar" length="80" not-null="true"
      unique="true" />
    <column name="authorID" datatype="integer" not-null="true" />
    <column name="publisherID" datatype="integer" not-null="true" />
    <column name="status" datatype="integer" not-null="true" />
    <column name="numCopies" datatype="integer" not-null="true" />
  </table>

  <foreign-key>
    <fk-table>Book</fk-table>
    <fk-column>authorID</fk-column>
    <fk-references>Author</fk-references>
  </foreign-key>
  <foreign-key>
    <fk-table>Book</fk-table>
    <fk-column>publisherID</fk-column>
    <fk-references>Publisher</fk-references>
  </foreign-key>
The schema.xml file specifies the database tables, columns, column datatypes, etc. It also specifies the foreign key relationships between tables. It is initially used to generate the database SQL schema file. The corresponding generated fragment is:
  create table Book (
    bookID integer not null
    ,title varchar(80) not null
    ,ISBN varchar(80) not null unique
    ,authorID integer not null
    ,publisherID integer not null
    ,status integer not null
    ,numCopies integer not null
    ,constraint Book_pk primary key(bookID)
  );

  alter table Book
    add constraint Book_authorID
    foreign key (authorID)
    references Author (authorID);

  alter table Book
    add constraint Book_publisherID
    foreign key (publisherID)
    references Publisher (publisherID);
In this example all table columns are listed directly in schema.xml.

Stereotypes

Unlike the generated SQL file, the schema.xml file does not need to directly list all the fields, datatypes, etc. of a table. In production systems many tables contain repetitive columns for purposes of audit trail, optimistic locking, etc. Instead of manually adding these fields to each table a stereotype may be used. This concept has been borrowed from UML, where it is represented in class diagrams as <<stereotype_name>>. Here we take advantage of our extensible XML grammar to attribute tables with desired stereotypes. An example would be to mark tables as <constant/> or <dynamic/>.

Dynamic tables would automatically have columns create_date, modification_date, and modification_count, used in optimistic locking, added. Database SQL trigger code to maintain these fields can be generated. Higher-level layers of generated code would automatically utilize these locking columns transparent to the developer.

Constant tables can produce generation-time or run-time warnings if they are modified. Later the <dynamic/> semantics could be extended to track column usage by adding created_by and modified_by columns, again automatically maintained by the higher layer code and again transparent to the developer.

The leverage provided by applying arbitrarily complex semantics to simple extensions to the grammar is a key advantage of this generation system. The equivalent in a UML-based generation system would be to apply a new custom stereotype <<dynamic>> to a class and extend the generator to implement the semantics.

Stereotypes leave the application developer free to concentrate on the important elements of the data design.

Basic SS Layer and Value Objects

The SS database access layer separates web pages from PEAR database access routines. The web pages communicate with the database by passing Value Objects. The SS layer is implemented as a separate PHP class wrapping each schema table. Default methods on the class accept or return Value Objects to add, update, delete, get, and getAll database rows. Independent of any customization a basic Value Object and SS layer is generated for each table in schema.xml.

The basic Value Object looks like this:

  class BookValue {
    // private member variables
    var $_bookID;
    var $_title;
    var $_ISBN;
    var $_authorID;
    var $_publisherID;
    var $_status;
    var $_numCopies;
 
    // empty constructor
    function BookValue(){
      ...
    }

    // ResultSet constructor
    function setFromRow($row){
      ...
    }

  // member variable getters and setters

  function getBookID() { return $this->_bookID; }
  function setBookID($bookID){ $this->_bookID = $bookID; }
    ...
  }
When you add or edit a record you need to construct or fetch a Value Object, then alter its contents and send it to the SS layer to be stored in the database.

The standard SS layer API looks like this:

  class BookSS {
    function getBookValue($bookID)
    // returns array of BookValue
    function getAllBookValue($orderBy)
    function add($value)
    function update($value)
    function delete($bookID)
  }

Extending the API

From schema.xml we create the SS layer and basic set of Value Objects. Production web pages often require more complex queries and Value Objects. Our generation system allows the Value Object, which defines the fields available for display, to be varied independently from the "where" portion of the custom query. A custom query can be used to return an auto-generated default Value Object, and an auto-generated get or getAll can be applied to a custom Value Object. Having them defined separately allows a custom Value Object to be defined once and then reused with many different queries. This is useful for the large number of web pages where the displayed table columns are identical but the specific query is different.

The extensions.xml file is used to define new Value Objects and add custom queries and methods to the SS layer.

Here are some extensions added to the Book table:

  <value-object name="BookWithNamesValue" base-table="Book">
    <add-column table="Author" column-name="name" />
    <add-column table="Author" column-name="penName" />
    <add-column table="Publisher" column-name="name" />
  </value-object>
This defines a new Value Object. It will contain all the columns of the base table plus the new columns added from the other tables. Generated SQL strings will automatically perform the joins necessary to pull in the other columns. get* and getAll* methods are generated for this new Value Object.
  <sql-query-method name="getAllByTitle" value-object="BookWithNamesValue" >
    <parameter name="title" />
    <where>Book1.title = ?</where>
  </sql-query-method>
This defines a new SQL query. It returns the previously defined new Value Object, and restricts its results using the specified SQL where-clause fragment.
  <custom-method name="updateStatusByPublisher" table="Book" return-type="void">
This defines a custom method. The actual implementation is placed in a hand-written PHP file, and is invoked by the SS layer.

Pages

The generator automatically produces test add, update, delete, and list web pages for all defined Value Objects, queries, and methods. Production web pages are specified in pages.xml. Supported page types are also add, update, delete, and list.

Here is an example list page for the Book table:

  <page name="BookList" type="list" label="Book List"
      value-object="BookWithNamesValue" order="Book1.title">
    <buttons>
      <button label="Add Book" target="BookAdd"/>
      <button label="List Books in Stores" target="StoreBookList"/>
    </buttons>
    <fields>
      <field name="update" label="Update" link="BookUpdate" virtual="true"/>
      <field name="delete" label="Delete" link="BookDelete" virtual="true"/>
      <field name="title" label="Title" link="BookView"/>
      <field name="ISBN" />
      <field name="author_Name" label="Author Name"/>
      <field name="publisher_Name" label="Publisher Name"/>
    </fields>
  </page>
Here is how the generated web page looks in the browser:

Note this page is using the BookWithNamesValue Value Object defined in extensions.xml. It automatically uses the SS method getAllBookWithNamesValue() (an extension is to use an alternate query method with the parameters read from the request).

The results are ordered by Book.title. The fields refer to the columns by name, or by table_Name in the case of columns added to the Value Object. Labels for all fields default to the column name but can be customized. Here customizations are specified directly; alternately they could be keys into a localization file indexed by user locale.

Hyperlinked buttons are supported both in the body of the table as well as separately. Different styles of hyperlinks (using icons, etc.) are one example of a feature easily added to the generator grammar based on page developer requests.

Here is the xml for an add page for the Book table:

  <page name="BookAdd" type="add" label ="Add Book"
     value-object="BookValue" success="BookList">
    <fields>
      <field name="title" label="Title"/>
      <field name="ISBN" />
      <field name="authorID" label="Author">
        <select table="Author" text="name" />
      </field>
      <field name="publisherID" label="Publisher">
        <select table="Publisher" text="name" />
      </field>
      <field name="status" label="Status"/>
      <field name="numCopies" label="Number of Copies"/>
    </fields>
  </page>
Here is the generated page in the browser:

Note HTML select tags (drop-down-lists) can be specified in the XML simply by listing the desired table and field name. The relevant SQL queries are automatically invoked at run-time. Because the generator has basic schema information available, field validation can be performed automatically:

Currently the system performs required (not null) and integer validation automatically. Single-column unique validation can also easily be added. Maximum text field widths are currently set from the schema.

Note the similarity between the pages.xml specification for a page and a functional specification. Both specify which fields are on a page in what order, which tables fields are taken from, which buttons are on the page, which pages they link to, and what labels are on the fields and buttons. This has several advantages:

The pages.xml specifications are so short because all the layout, etc. decisions have been moved elsewhere in the system. CSS (Cascading Style Sheets) allow the colors, fonts, etc. of HTML elements to be specified in a separate file. But basics of page layout currently need to remain in the HTML file.

By moving the high-level specification of a page to pages.xml the generator controls the placement of page elements. A simple change to the generator template file can alter the button layout, for instance:

Here we have altered the generator with a new layout style. The generator enforces the latest layout style standards so the page developers can concentrate on business functions.

While templating systems alone can accomplish some of these goals, templating combined with generation can create a much more powerful system. Generators can both utilize templating systems in their implementation of generated pages and utilize a templating pattern in XML specification of pages.

Pages in pages.xml can be parameterized and treated as sub-components of other pages. In this way a single page description can be varied and utilized in many different pages. Of course hierarchically composed pages can be constructed from a mixture of generated and hand-written subpages.

Development Style

Currently the generator presented here (and available for download) is a "toy" system. A significant amount of functionality would need to be added to complete a production web site. From my experience performing this extension is a feasible and productive method of developing a working system.

The amount and types of extensions necessary depend on the kind of web site being developed. The production web pages may be nearly suitable as-is for a simple administrative user interface. Customer-facing list pages will need paging mechanisms, editable fields, icons, localization, etc. Headers and footers providing logos, navigation menus (also generatable from a declarative description!), etc. should also be added. Mechanisms for attaching hand-written fragments of PHP code to generated pages will also be needed. These fragments use HTTP get/post and session state variables in preparing query parameters, calling business logic, and controlling navigation flow.

The grammar used in each of the XML files, and especially pages.xml, forms a unique, very high level, declarative language describing your specific application. As new requirements arise during iterative analysis and implementation the grammar is extended (new XML elements and attributes are added). The XML grammar remains concise because it is not trying to be a general-purpose language. The development team will typically be split between business analyst / page author / business logic developers and generator tool developers. The generator tool developers will continuously add new features based on analyst / page author requests. The generator and its grammar co-evolve with your understanding of your application domain. A release of the generator is done when the entire system is ready to ship.

And a big tip for team productivity, for both the analysts and the generator developer: maintain up-to-date DTDs (Document Type Definitions) or XML Schemas for your XML grammar. When anyone has a question about their XML input ask "does it validate?" While none of the Ruby XML APIs currently validate, it is simple to create a 15-line Python script and add it as a build target to your system. DTDs schema.dtd, extensions.dtd, pages.dtd, and samples.dtd have been included with the code. DTDs and XML Schemas allow you to declaratively specify your XML grammar and automatically validate input files against it using the XML parser. Otherwise you have to check for grammatical errors yourself procedurally inside your generator. DTDs and XML Schemas also provide exact documentation of your current grammar for the other developers.

Conclusions

There are many different ways to implement a code generator that generates complete application tiers. I have presented the advantages of a generator that has complete schema information and uses a custom-defined and continuously extended declarative XML grammar. Some advantages I have presented are: Finally, all of these and many other discussed advantages are possible because we have built our generator around a declarative grammar that focuses on WHAT, not HOW.

Special thanks to Jack Herrington for editorial suggestions.